A proposed data fusion architecture for micro-zone analysis and data mining
Data Fusion requires the ability to combine or “fuse” data from multiple data sources. Time Series Analysis is a data mining technique used to predict future values from a data set based upon past values. Unlike other data mining techniques, however, Time Series places special emphasis on periodicity and how seasonal and other time-based factors tend to affect trends over time. One of the difficulties encountered in developing generic time series techniques is the wide variability of the data sets available for analysis. This presents challenges all the way from the data gathering stage to results presentation. This paper presents an architecture designed and used to facilitate the collection of disparate data sets well suited to Time Series analysis as well as other predictive data mining techniques. Results show this architecture provides a flexible, dynamic framework for the capture and storage of a myriad of dissimilar data sets and can serve as a foundation from which to build a complete data fusion architecture
Building Energy Load Forecasting using Deep Neural Networks
Ensuring sustainability demands more efficient energy management with
minimized energy wastage. Therefore, the power grid of the future should
provide an unprecedented level of flexibility in energy management. To that
end, intelligent decision making requires accurate predictions of future energy
demand/load, both at the aggregate and the individual site level. Thus, energy load
forecasting has received increased attention in the recent past; however, it has
proven to be a difficult problem. This paper presents a novel energy load
forecasting methodology based on Deep Neural Networks, specifically Long Short
Term Memory (LSTM) algorithms. The presented work investigates two variants of
the LSTM: 1) standard LSTM and 2) LSTM-based Sequence to Sequence (S2S)
architecture. Both methods were implemented on a benchmark data set of
electricity consumption data from one residential customer. Both architectures
were trained and tested on one-hour and one-minute time-step resolution
datasets. Experimental results showed that the standard LSTM failed at
one-minute resolution data while performing well in one-hour resolution data.
It was shown that S2S architecture performed well on both datasets. Further, it
was shown that the presented methods produced results comparable to other
deep learning methods for energy forecasting in the literature
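A sequence model of this kind is typically trained on pairs of past and future load windows. The sketch below is an illustration only, not the authors' preprocessing: it shows one plausible way to derive an hourly series from minute-level readings and cut it into (input window, target window) pairs. The function names `resample_mean` and `make_windows`, the window lengths, and the synthetic data are all assumptions.

```python
from statistics import mean


def resample_mean(series, factor):
    """Average consecutive groups of `factor` readings (e.g. 60 minutes -> 1 hour).

    A trailing partial group is dropped.
    """
    usable = len(series) - len(series) % factor
    return [mean(series[i:i + factor]) for i in range(0, usable, factor)]


def make_windows(series, n_in, n_out):
    """Cut a series into (past window, future window) training pairs."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        past = series[start:start + n_in]
        future = series[start + n_in:start + n_in + n_out]
        pairs.append((past, future))
    return pairs


# Synthetic minute-level load covering four hours (hypothetical data).
minute_load = [float(i % 120) for i in range(240)]
hourly_load = resample_mean(minute_load, 60)

# Two past hours as model input, one future hour as the target.
pairs = make_windows(hourly_load, n_in=2, n_out=1)
```

With `n_out` greater than 1 the same pairs would suit a sequence-to-sequence setup, where the model emits several future steps at once.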
An Adversarial Approach for Explainable AI in Intrusion Detection Systems
Despite the growing popularity of modern machine learning techniques (e.g.
Deep Neural Networks) in cyber-security applications, most of these models are
perceived as black boxes by the user. Adversarial machine learning offers an
approach to increase our understanding of these models. In this paper we
present an approach to generate explanations for incorrect classifications made
by data-driven Intrusion Detection Systems (IDSs). An adversarial approach is
used to find the minimum modifications (of the input features) required to
correctly classify a given set of misclassified samples. The magnitude of such
modifications is used to visualize the most relevant features that explain the
reason for the misclassification. The presented methodology generated
satisfactory explanations that describe the reasoning behind the
misclassifications, with descriptions that match expert knowledge. The
advantages of the presented methodology are that it 1) is applicable to any
classifier with defined gradients, 2) does not require any modification of the
classifier model, and 3) can be extended to perform further diagnosis (e.g.
vulnerability assessment) and gain further understanding of the system.
Experimental evaluation was conducted on the NSL-KDD99 benchmark dataset using
linear and multilayer perceptron classifiers. The results are shown using intuitive
visualizations in order to improve the interpretability of the results
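The idea of a minimum modification that flips a classification can be illustrated for the simplest case, a binary linear classifier, where the smallest L2 change has a closed form. This is a sketch under assumptions, not the paper's optimization procedure: the weights, bias, sample, `overshoot` parameter, and function names are hypothetical, and a gradient-based search would be needed for a multilayer perceptron.

```python
def score(x, w, b):
    """Linear decision function w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x, w, b):
    """Class 1 if the score is positive, else class 0."""
    return 1 if score(x, w, b) > 0 else 0


def minimal_change(x, w, b, overshoot=1e-3):
    """Smallest L2 perturbation that moves x just across the decision boundary.

    For a linear model the shortest path to the boundary is along w;
    `overshoot` pushes the sample slightly past it so the label flips.
    """
    norm_sq = sum(wi * wi for wi in w)
    step = -(1.0 + overshoot) * score(x, w, b) / norm_sq
    return [step * wi for wi in w]


# Hypothetical model and a sample assumed to be misclassified as class 1.
w = [2.0, 0.0, -1.0]
b = -0.5
x = [1.0, 3.0, 0.5]

delta = minimal_change(x, w, b)
x_fixed = [xi + di for xi, di in zip(x, delta)]

# Per-feature magnitude of the change, used as a relevance indicator.
relevance = [abs(di) for di in delta]
```

Here the first feature needs the largest change and the second needs none, so a bar chart of `relevance` would point to the first feature as the main driver of the misclassification.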
DSTiPE Algorithm for Fuzzy Spatio-Temporal Risk Calculation in Wireless Environments
Time and location data play a significant role in a variety of factory automation scenarios, ranging from the navigation, tracking, and monitoring of automated vehicles and robots to optimization and security services. In addition, pervasive wireless capabilities combined with time and location information are enabling new applications in areas such as transportation systems, health care, elder care, military, emergency response, critical infrastructure, and law enforcement. A person or object in proximity to certain areas for specific durations of time may pose a risk to themselves, others, or the environment. This paper presents DSTiPE, a novel fuzzy-based method for calculating the spatio-temporal risk that an object with wireless communications presents to the environment. The presented MATLAB-based application for fuzzy spatio-temporal risk cluster extraction is verified on a diagonal vehicle movement example
Improving cyber-security of smart grid systems via anomaly detection and linguistic domain knowledge
The planned large-scale deployment of smart grid network devices will generate a large amount of information exchanged over various types of communication networks. The implementation of these critical systems will require appropriate cyber-security measures. A network anomaly detection solution is considered in this work. In common network architectures, multiple communication streams are simultaneously present, making it difficult to build an anomaly detection solution for the entire system. In addition, common anomaly detection algorithms require specification of a sensitivity threshold, which inevitably leads to a tradeoff between false positive and false negative rates. In order to alleviate these issues, this paper proposes a novel anomaly detection architecture. The designed system applies the previously developed network security cyber-sensor method to individual selected communication streams, which allows accurate models of normal network behavior to be learned. Furthermore, the developed system dynamically adjusts the sensitivity threshold of each anomaly detection algorithm based on domain knowledge about the specific network system. It is proposed to model this domain knowledge using Interval Type-2 Fuzzy Logic rules, which linguistically describe the relationship between various features of the network communication and the possibility of a cyber attack. The proposed method was tested on an experimental smart grid system, demonstrating enhanced cyber-security
Information gain based dimensionality selection for classifying text documents
Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel genetic-algorithm-based methodology for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses the information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic-algorithm-based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods
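The abstract does not state how information gain is mapped to mutation probability, so the sketch below assumes one simple rule purely for illustration: a gene that selects a high-gain dimension is less likely to be switched off and more likely to be switched on. The function names, the `base_rate` parameter, and the toy data are hypothetical.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset = [y for f, y in zip(feature, labels) if f == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def mutate(chromosome, gains, base_rate=0.1, rng=random):
    """Flip bits with a probability biased by each dimension's information gain.

    Assumed rule (not from the paper): relative gain raises the chance of
    turning a dimension on and lowers the chance of turning it off.
    """
    top = max(gains) or 1.0  # avoid division by zero when all gains are 0
    child = []
    for bit, gain in zip(chromosome, gains):
        rel = gain / top
        p = base_rate * (1 + rel) if bit == 0 else base_rate * (1 - rel)
        child.append(1 - bit if rng.random() < p else bit)
    return child


# Toy data: feature_a predicts the label perfectly, feature_b not at all.
labels = [1, 1, 0, 0]
feature_a = [1, 1, 0, 0]
feature_b = [1, 0, 1, 0]

# Gains are computed once, before the genetic algorithm runs.
gains = [information_gain(feature_a, labels), information_gain(feature_b, labels)]
child = mutate([1, 1], gains, base_rate=0.1)
```

Because `gains` is computed once up front, the per-generation cost of mutation stays the same as in a conventional genetic algorithm, which matches the abstract's point about computational complexity.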
WESBES: A wireless embedded sensor for improving human comfort metrics using temporospatially correlated data
When utilized properly, energy management systems (EMS) can offer significant energy savings by optimizing the efficiency of heating, ventilation, and air-conditioning (HVAC) systems. However, difficulty often arises due to the constraints imposed by the need to maintain an acceptable level of comfort for a building’s occupants. This challenge is compounded by the fact that human comfort is difficult to define in a measurable way. One way to address this problem is to provide a building manager with direct feedback from the building’s users. Still, this data is relative in nature, making it difficult to determine the actions that need to be taken. While some useful comfort correlations have been devised, such as ASHRAE’s Predicted Mean Vote index, they are rules of thumb that do not connect individual feedback with direct, diverse feedback sensing. Because they are correlations, the effects of climate, building age, and associated defects such as draftiness fall outside their scope. Therefore, the contribution of this paper is the Wireless Embedded Smart Block for Environment Sensing (WESBES), an affordable wireless sensor platform that allows subjective human comfort data to be directly paired with temporospatially correlated objective sensor measurements for use in EMS. The described device offers a flexible research platform for analyzing the relationship between objective and subjective occupant feedback in order to formulate more meaningful measures of human comfort. It could also offer an affordable and expandable option for real-world deployment in existing EMS